## Modeling and insights {: #modeling-and-insights }

DataRobot automates many parts of the modeling pipeline, including processing and partitioning the dataset, as described [here](model-data). This document starts with the visualizations available once modeling has started.

### Exploratory Data Analysis (EDA) {: #exploratory-data-analysis-eda }

Navigate to the **Data** tab to learn more about your data through [EDA](eda-explained), the summary statistics that DataRobot calculates from a sample of the data. Click each feature to see a variety of information, including a [histogram](histogram) that represents the relationship of the feature with the target.

![](images/aml-biz-10.png)

### Feature Associations {: #feature-associations }

While DataRobot is running Autopilot to find the champion model, use the [**Data > Feature Associations**](feature-assoc) tab to view the feature association matrix and understand the correlations between each pair of input features. For example, the features `nbrPurchases90d` and `nbrDistinctMerch90d` (top-left corner) have strong associations and are, therefore, ‘clustered’ together (where each color block in this matrix is a cluster).

![](images/aml-biz-1.png)

DataRobot provides a variety of insights to [interpret results](#interpret-results) and [evaluate accuracy](#evaluate-accuracy).

### Leaderboard {: #leaderboard }

After Autopilot completes, the Leaderboard ranks each model based on the selected optimization metric (LogLoss in this case).

The outcome of Autopilot is not only a selection of best-suited models, but also the identification of a recommended model&mdash;the model that best understands how to predict the target feature `SAR`. Choosing the best model is a balance of accuracy, metric performance, and model simplicity. See the [model recommendation process](model-rec-process) description for more detail.

Autopilot will continue building models until it selects the best predictive model for the specified target feature. This model is at the top of the Leaderboard, marked with the **Recommended for Deployment** badge.

![](images/aml-biz-2.png)

To reduce false positives, you can sort the Leaderboard by other metrics, such as Gini Norm, which measures how well a model ranks SAR alerts above non-SAR alerts.
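As a rough illustration of what a ranking metric like Gini Norm measures, the sketch below uses the standard identity for binary targets, normalized Gini = 2 × AUC − 1. It is not DataRobot's implementation; the function names and the sample data are hypothetical.

```python
def auc(actuals, scores):
    """Probability that a random SAR outranks a random non-SAR (ties count half)."""
    pos = [s for a, s in zip(actuals, scores) if a == 1]
    neg = [s for a, s in zip(actuals, scores) if a == 0]
    if not pos or not neg:
        raise ValueError("need at least one SAR and one non-SAR alert")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_norm(actuals, scores):
    """Normalized Gini for a binary target: 1.0 is a perfect ranking, 0.0 is random."""
    return 2.0 * auc(actuals, scores) - 1.0


# Made-up alerts: 1 = SAR, 0 = non-SAR, with model scores.
actuals = [0, 0, 1, 0, 1, 1]
scores = [0.05, 0.20, 0.30, 0.40, 0.70, 0.90]
print(gini_norm(actuals, scores))
```

A model that scores every SAR above every non-SAR alert gets a value of 1.0, regardless of how well calibrated the scores are, which is why a ranking metric suits alert prioritization.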

### Interpret results {: #interpret-results }

There are many visualizations within DataRobot that provide insight into why an alert might be SAR. Below are the most relevant for this use case.

#### Blueprint {: #blueprint }

Click on a model to reveal the model [blueprint](blueprints)&mdash;the pipeline of preprocessing steps, modeling algorithms, and post-processing steps used to create the model.

![](images/aml-biz-11.png)

#### Feature Impact {: #feature-impact }

[**Feature Impact**](feature-impact) reveals the association between each feature and the target. DataRobot identifies the top three most impactful features (which enable the machine to differentiate SAR from non-SAR alerts) as `total merchant credit in the last 90 days`, `number refund requests by the customer in the last 90 days`, and `total refund amount in the last 90 days`.

![](images/aml-biz-3.png)

#### Feature Effects {: #feature-effects }

To understand the direction of impact and the SAR risk at different levels of an input feature, DataRobot provides partial dependence graphs (within the [**Feature Effects**](feature-effects) tab) that depict how the likelihood of being a SAR changes as the input feature takes different values. In this example, the total merchant credit amount in the last 90 days is the most impactful feature, but the SAR risk does not increase linearly with the amount.

* When the amount is below $1000, the SAR risk remains relatively low.
* SAR risk surges when the amount is above $1000.
* The increase in SAR risk slows as the amount approaches $1500.
* SAR risk climbs again until it peaks and plateaus at around $2200.

![](images/aml-biz-4.png)


The partial dependence graph makes it straightforward to interpret the SAR risk at different levels of the input features. It could also be converted into a data-driven framework for setting risk-based thresholds that augment the traditional rule-based system.
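The idea behind a partial dependence curve can be sketched in a few lines: for each candidate value of one feature, overwrite that feature in every row, score the rows, and average the predictions. The example below is a minimal sketch with a toy scoring function standing in for a trained model; the feature names, the `toy_predict` function, and the data are all hypothetical and are not taken from DataRobot.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # force the feature to the grid value
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_predict(row):
    """Hypothetical stand-in for a trained model's SAR probability."""
    base = 0.05 if row["merchant_credit_90d"] < 1000 else 0.40
    return min(1.0, base + 0.01 * row["refund_requests_90d"])


rows = [
    {"merchant_credit_90d": 500, "refund_requests_90d": 0},
    {"merchant_credit_90d": 1800, "refund_requests_90d": 5},
]
curve = partial_dependence(toy_predict, rows, "merchant_credit_90d", [500, 1500])
for value, risk in curve:
    print(value, round(risk, 3))
```

Because every other feature keeps its observed value, the curve isolates how the average predicted risk responds to the one feature being varied.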

#### Prediction Explanations {: #prediction-explanations }

To turn the machine-made decisions into human-interpretable rationale, DataRobot provides [**Prediction Explanations**](pred-explain/index) for each alert scored and prioritized by the machine learning model. In the example below, the record with `ID=1269` has a very high likelihood of being a suspicious activity (prediction=90.2%), and the three main reasons are:

* Total merchant credit amount in the last 90 days is significantly greater than the others.
* Total spend in the last 90 days is much higher than average.
* Total payment amount in the last 90 days is much higher than average.

![](images/aml-biz-5.png)

**Prediction Explanations** can also be used to cluster alerts into subgroups with different types of transactional behaviors, which could help triage alerts to different investigation approaches.

#### Word Cloud {: #word-cloud }

The [**Word Cloud**](word-cloud) allows you to explore how text fields affect predictions. The Word Cloud uses a color spectrum to indicate the word's impact on the prediction. In this example, red words indicate the alert is more likely to be associated with a SAR.

![](images/aml-biz-12.png)


### Evaluate accuracy {: #evaluate-accuracy }

The following insights help evaluate accuracy.

#### Lift Chart {: #lift-chart }

The [**Lift Chart**](lift-chart) shows how effective the model is at separating SAR and non-SAR alerts. After an alert in the out-of-sample partition is scored by the model, it is assigned a risk score that measures the likelihood of the alert being, or becoming, a SAR. In the **Lift Chart**, alerts are sorted by SAR risk, divided into deciles, and displayed from lowest to highest risk. For each decile, DataRobot computes the average predicted SAR risk (blue plus) and the average actual SAR rate (orange circle) and plots the two lines together.

For the champion model built for this false positive reduction use case, the SAR rate of the top decile is 55%, a significant lift from the ~10% SAR rate in the training data. The top three deciles capture almost all SARs, which means that the 70% of alerts with the lowest predicted SAR risk rarely result in a SAR.
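The binning behind a lift chart can be approximated as follows. This is a simplified sketch, not DataRobot's implementation: the `lift_table` function and the sample data are hypothetical, and equal-sized bins are assumed.

```python
def lift_table(actuals, predictions, bins=10):
    """Sort alerts by predicted risk, split into equal bins, and average each bin."""
    pairs = sorted(zip(predictions, actuals))  # lowest predicted risk first
    n = len(pairs)
    if n < bins:
        raise ValueError("need at least one alert per bin")
    table = []
    for b in range(bins):
        chunk = pairs[b * n // bins:(b + 1) * n // bins]
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        mean_actual = sum(a for _, a in chunk) / len(chunk)
        table.append((b + 1, mean_predicted, mean_actual))
    return table


# Made-up data: 20 alerts, with the two highest-scored alerts being SARs.
predictions = [i / 20 for i in range(20)]
actuals = [1 if i >= 18 else 0 for i in range(20)]

table = lift_table(actuals, predictions)
for decile, mean_predicted, mean_actual in table:
    print(decile, round(mean_predicted, 3), mean_actual)
```

When the two averages track each other across bins and the actual SAR rate concentrates in the top bins, the model is both well calibrated and effective at ranking.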

![](images/aml-biz-6.png)

#### ROC Curve {: #roc-curve }

Once you know the model is performing well, you select an explicit threshold to make a binary decision based on the continuous SAR risk predicted by DataRobot. The [**ROC Curve**](roc-curve-tab-use) tools provide a variety of information to help make some of the important decisions in selecting the optimal threshold:

* The false negative rate has to be as small as possible. False negatives are the alerts that DataRobot determines are not SARs which then turn out to be true SARs. Missing a true SAR is very dangerous and would potentially result in an MRA (matter requiring attention) or regulatory fine. 

	This case takes a conservative approach. To have a false negative rate of 0, the threshold has to be low enough to capture all the SARs.

* Keep the alert volume as low as possible to eliminate as many false positives as you can. In this context, all alerts generated in the past that are not SARs are the de facto false positives. The machine learning model is likely to assign a lower score to those non-SAR alerts; therefore, pick a threshold high enough to remove as many false positive alerts as possible.

* Ensure the selected threshold works not only on seen data but also on unseen data, so that when the model is deployed to the transaction monitoring system for ongoing scoring, it can still reduce false positives without missing any SARs.

Testing different thresholds on the cross-validation data (the data used for model training and validation) shows that `0.03` is the optimal threshold because it satisfies the first two criteria. The false negative rate is 0, and the alert volume is reduced from `8000` to `2142`, which removes 73% of alerts (`5858/8000`) as false positives without missing any SARs.

![](images/aml-biz-7.png)

To check the third criterion (whether the threshold also works on unseen alerts), you can quickly validate it in DataRobot. After changing the data selection to Holdout and applying the same threshold (`0.03`), the false negative rate remains 0 and the false positive reduction rate remains at 73% (`1457/2000`). This indicates that the model generalizes well and is likely to perform as expected on unseen data.
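The selection logic described above can be sketched as follows: choose the highest threshold that still keeps every SAR in the cross-validation data, then apply the same threshold to the holdout data. This is an illustrative sketch only. The helper names and sample scores are hypothetical, and it assumes an alert is kept when its score is at or above the threshold. The last two lines reproduce the reduction percentages quoted in the text.

```python
def confusion_at(actuals, scores, threshold):
    """Count (tp, fp, fn, tn), treating score >= threshold as 'keep the alert'."""
    tp = fp = fn = tn = 0
    for actual, score in zip(actuals, scores):
        flagged = score >= threshold
        if flagged and actual == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def highest_zero_fn_threshold(actuals, scores):
    """Highest cutoff that still keeps every SAR: the lowest score among true SARs."""
    sar_scores = [s for a, s in zip(actuals, scores) if a == 1]
    if not sar_scores:
        raise ValueError("no SARs in the data")
    return min(sar_scores)


# Made-up cross-validation and holdout scores (1 = SAR, 0 = non-SAR).
cv_actuals = [0, 0, 0, 0, 1, 0, 1, 1]
cv_scores = [0.01, 0.02, 0.02, 0.04, 0.03, 0.10, 0.50, 0.80]
holdout_actuals = [0, 0, 1, 0]
holdout_scores = [0.01, 0.02, 0.06, 0.05]

threshold = highest_zero_fn_threshold(cv_actuals, cv_scores)
cv_counts = confusion_at(cv_actuals, cv_scores, threshold)
holdout_counts = confusion_at(holdout_actuals, holdout_scores, threshold)
print(threshold, cv_counts, holdout_counts)

# Arithmetic from the text: alert reduction on cross-validation and holdout data.
cv_reduction = (8000 - 2142) / 8000   # 5858 / 8000
holdout_reduction = 1457 / 2000
print(round(cv_reduction, 3), round(holdout_reduction, 3))
```

A zero false negative count on the holdout data is the signal to look for; if it rises above zero, the threshold chosen on cross-validation data is too aggressive for unseen alerts.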

#### Payoff Matrix {: #payoff-matrix }

From the **Profit Curve** tab, use the [**Payoff Matrix**](profit-curve) to set thresholds based on simulated profit. If the bank has a specific risk tolerance for missing a small portion of historical SARs, it can also apply the **Payoff Matrix** to pick the optimal threshold for the binary cutoff. For example:

Field | Example | Description
----- | ------- | -----------
False Negative | FN=`-$200` | Reflects the cost of remediating a SAR that was not detected.
False Positive | FP=`-$50` | Reflects the cost of investigating an alert that proved to be a "false alarm."
Metrics | False Positive Rate, False Negative Rate, and Average Profit | Provides standard statistics to help describe model performance at the selected display threshold.

By setting the cost per false positive to `$50` (cost of investigating an alert) and the cost per false negative to `$200` (cost of remediating a SAR that was not detected), the threshold is optimized at `0.1183`, which gives a minimum cost of `$53k ($6.6 * 8000)` across 8000 alerts and the highest ROI of `$347k ($50 * 8000 - $53k)`.

On the one hand, the false negative rate remains low (only 5 SARs were not detected); on the other hand, the alert volume is reduced from 8000 to 1988, meaning the number of investigations is reduced by more than 75% (6012/8000).

The threshold is optimized at `0.0619`, which gives the highest ROI of $300k out of 8000 alerts. By setting this threshold, the bank will reduce false positives by 74.3% (`5940/8000`) at the risk of missing only 3 SARs.
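The cost-based selection can be sketched as a search over candidate thresholds for the one with the lowest total cost, using the `$50` and `$200` costs from the table above. This is a simplified illustration rather than DataRobot's Profit Curve calculation. The function names, sample scores, and candidate thresholds are hypothetical, and it assumes an alert is investigated when its score is at or above the threshold. The final lines reproduce the arithmetic quoted for the `0.1183` example.

```python
def total_cost(actuals, scores, threshold, fp_cost=50.0, fn_cost=200.0):
    """Cost of investigating false positives plus remediating missed SARs."""
    cost = 0.0
    for actual, score in zip(actuals, scores):
        flagged = score >= threshold
        if flagged and actual == 0:
            cost += fp_cost   # investigated an alert that was a false alarm
        elif not flagged and actual == 1:
            cost += fn_cost   # missed a true SAR
    return cost


def cheapest_threshold(actuals, scores, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(actuals, scores, t))


# Made-up alerts (1 = SAR, 0 = non-SAR) and candidate thresholds.
actuals = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.30, 0.60, 0.90]
candidates = [0.0, 0.05, 0.10, 0.50]

best = cheapest_threshold(actuals, scores, candidates)
print(best, total_cost(actuals, scores, best))

# Arithmetic from the text: $6.6 average cost per alert across 8000 alerts,
# compared with investigating all 8000 alerts at $50 each.
minimum_cost = round(6.6 * 8000)       # about $53k
savings = 50 * 8000 - minimum_cost     # about $347k
print(minimum_cost, savings)
```

Raising the false negative cost relative to the false positive cost pushes the cheapest threshold lower, which is how a bank's risk tolerance for missed SARs translates into a cutoff.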

![](images/aml-biz-8.png)

See the [deep dive](#deep-dive-imbalanced-targets) for information on handling class imbalance problems.


### Post-processing {: #post-processing }

Once the modeling team decides on the champion model, they can download [compliance documentation](compliance/index) for the model. The resulting Microsoft Word document provides a 360-degree view of the entire model-building process, as well as all the challenger models that are compared to the champion model. Most of the machine learning models used for the Financial Crime Compliance domain require approval from the Model Risk Management (MRM) team. The compliance document provides comprehensive evidence and rationale for each step in the model development process.

![](images/aml-biz-9.png)

